This project was completed as a Capstone Project by University of Pennsylvania graduate student Sean McClellan. The Master’s of Urban Spatial Analytics (MUSA) program is a one year program focused on data science, spatial analytics, statistics, GIS, and data visualization.
More information on the program can be found here. More of Sean’s work can be found here.
NFL Football is a sport filled with passion and aggression. These qualities are shared by the fans. While passion at the stadium can make for a fun and competitive atmosphere, these qualities can just as easily result in antinormative behavior such as crime. For instance, assaults, vandalism, arrests for disorderly conduct, and arrests for alcohol-related offenses are all common arrests on football game days according to Rees & Schnepel (2009) which Deindividuation Theory would attribute to due to large crowds and stadium atmospheres. These settings and circumstances contribute to a collective mind, which can cause individuals to lose the ability of evaluating themselves or their actions and result in irrational or irresponsible behavior.
Exploring crime incidents across time and space in cities with professional football teams answers questions about the effect of NFL football on crime. Furthermore, patterns of spatial distribution and fluctuations in offense type totals can be connected to the result of that city’s team.
Research has been conducted on this topic in the past, however most of this research has been strictly on levels of crime and included no spatial analysis. My project would fill this void by observing clustering and hot/cold spots of crime in each city. While the analysis will include basic measurements and comparisons of crime that is covered in previous research, this hot spot analysis will add a new component to NFL-related crime.
I hypothesize that crime such as theft and assault will increase in density or cluster closer around each team’s stadium on game days. In addition to the spatial components of this analysis, it would be interesting to view scatter plots of points allowed by the home team and counts of crime, specifically violent crimes, to see if there is a correlation between performance and criminality.
Literature review
This analysis will complete an in-depth analysis of four NFL teams from the NFC East Division based on the 2019 regular season: The Dallas Cowboys, New York Giants, Philadelphia Eagles, and the Washington Redskins (now the Commanders). Crime will be analyzed in the cities of these four teams with specific tests and figures showing concentration and clustering of crime around the stadium, bars, subway stops, and other local features. These crime levels and patterns will then be compared between each city, painting a picture of how these teams affect their cities.
The methodology of this analysis will first include the gathering and cleaning of data. Working the datasets into desired formats will be vital to ensure data is ready for analysis despite being from many different sources. Then, data will need to be separated into home games and away games for each team. This will be done by selecting only the observations in which the desired team is the home team. Once the NFL games have been wrangled, city/regional boundaries and stadium locations can be plotted with density of crimes for both gamedays and non-gamedays.
Importing raw data for wrangling and cleaning includes reading basic NFL data which holds information on historical NFL games, scores, betting information, and more.
nfl <- read.csv("spreadspoke_scores.csv") %>%
filter(schedule_season == "2019")
datatable(nfl, options = list(pageLength = 10,scrollX='400px'))
Further data wrangling and cleaning can be completed by filtering this data to only include the teams included in this analysis. For instance, here is data extracted from the previous NFL data for only the Philadelphia Eagles.
eagles <- subset(nfl, team_home == "Philadelphia Eagles" | team_away == "Philadelphia Eagles") %>%
mutate(date = mdy(schedule_date))
datatable(eagles)
These scores and game details will become useful later into the analysis. Crime data must be imported so game day crimes can be spatially compared to non-gameday crimes. For now, let’s take a look at crime acorss the city for the entire year of 2019:
phlCrime <- read_sf("Philadelphia Eagles/incidents_part1_part2/incidents_part1_part2.shp") %>%
st_transform('EPSG:3857') %>%
mutate(date = ymd(dispatch_d),
count = 1) %>%
filter(point_y > 30)
df_sub <- phlCrime[1:100,] # display the first 100 rows
df_sub$dispatch_t <- as.character(df_sub$dispatch_t)
datatable(df_sub, options = list(pageLength = 5,scrollX='400px'))
data <- phlCrime[1:10000,] # display the first 10,000 rows
data$popup <- paste("<b>Incident #: </b>", data$objectid,
"<br>", "<b>Description: </b>", data$text_gener,
"<br>", "<b>Date: </b>", data$dispatch_d,
"<br>", "<b>Time: </b>", data$dispatch_t,
"<br>", "<b>Longitude: </b>", data$point_x,
"<br>", "<b>Latitude: </b>", data$point_y)
leaflet(data, width = "100%") %>% addTiles() %>%
addTiles(group = "OSM (default)") %>%
addProviderTiles(provider = "Esri.WorldStreetMap",
group = "World StreetMap") %>%
addProviderTiles(providers$CartoDB.Positron) %>%
addMarkers(lng = ~point_x, lat = ~point_y, popup = data$popup,
clusterOptions = markerClusterOptions()) %>%
addLayersControl(
baseGroups = c("Positron (default)","World StreetMap"),
options = layersControlOptions(collapsed = FALSE))
The figure above shows crime at multiple elevations. At the highest elevation, areas around Center City possess a staggering amount of crimes compared to the outer parts of the city. However, crime was a citywide issue in 2019.
crime_cats <- phlCrime %>%
mutate(category = case_when(
text_gener == "Thefts" ~ "Thefts",
text_gener == "Theft from Vehicle" ~ "Thefts",
text_gener == "Aggravated Assault No Firearm" ~ "Violent",
text_gener == "Burglary Residential" ~ "Violent",
text_gener == "Robbery No Firearm" ~ "Violent",
text_gener == "Burglary Non-Residential" ~ "Violent",
text_gener == "Robbery Firearm" ~ "Violent",
text_gener == "Rape" ~ "Rape",
text_gener == "Aggravated Assault Firearm" ~ "Violent",
text_gener == "Other Assaults" ~ "Violent",
text_gener == "Narcotic / Drug Law Violations" ~ "Alcohol/Narcotics",
text_gener == "Weapon Violations" ~ "All Other Offenses",
text_gener == "All Other Offenses" ~ "All Other Offenses",
text_gener == "Vandalism/Criminal Mischief" ~ "All Other Offenses",
text_gener == "DRIVING UNDER THE INFLUENCE" ~ "Alcohol/Narcotics",
text_gener == "Fraud" ~ "Financial",
text_gener == "Forgery and Counterfeiting" ~ "Financial",
text_gener == "Embezzlement" ~ "Financial",
text_gener == "Disorderly Conduct" ~ "All Other Offenses",
text_gener == "Arson" ~ "All Other Offenses",
text_gener == "Offenses Against Family and Children" ~ "All Other Offenses",
text_gener == "Other Sex Offenses (Not Commercialized)" ~ "Sex",
text_gener == "Prostitution and Commercialized Vice" ~ "Sex",
text_gener == "Public Drunkenness" ~ "Alcohol/Narcotics",
text_gener == "Liquor Law Violations" ~ "Alcohol/Narcotics",
text_gener == "Gambling Violations" ~ "All Other Offenses",
text_gener == "Receiving Stolen Property" ~ "All Other Offenses",
text_gener == "Vagrancy/Loitering" ~ "All Other Offenses",
text_gener == "Homicide - Criminal" ~ "Homicide",
text_gener == "Motor Vehicle Theft" ~ "Thefts"
))
phlCrime_counts <- crime_cats %>%
group_by(category) %>%
summarize(count=n()) %>%
st_drop_geometry()
ggplot(data=phlCrime_counts,aes(x=category, y=count)) +
geom_bar(stat="identity", width=0.5) +
labs(title = "2019 Crime by Offense Category", subtitle = "Philadelphia") +
plotTheme
The types of crime, seen above, with the highest number of offenses were violent crimes and thefts. According to the research by Rees and Schnepel, thefts are known to increase during NFL home games. Where these crimes occur hold just as much importance as their offense category. Aggregating criminal incidents from 2019 to a fishnet grid gives an idea of where the most crime is occurring, similar to the interactive map above.
phlBound <-
st_read("http://data.phl.opendata.arcgis.com/datasets/063f5f85ef17468ebfebc1d2498b7daf_0.geojson") %>%
st_transform('EPSG:3857')
## Reading layer `abf6322f-732c-41eb-be2f-a687a95d16462020329-1-11fkgla.x0gkf' from data source `http://data.phl.opendata.arcgis.com/datasets/063f5f85ef17468ebfebc1d2498b7daf_0.geojson'
## using driver `GeoJSON'
## Simple feature collection with 288 features and 4 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -75.28031 ymin: 39.86771 xmax: -74.95575 ymax: 40.13793
## Geodetic CRS: WGS 84
phlOutline <- st_union(phlBound) %>%
st_transform('EPSG:3857')
fishnetHexPhl <-
st_make_grid(phlBound,
cellsize = 750,
square = FALSE) %>%
.[phlBound] %>%
st_sf() %>%
mutate(uniqueID = rownames(.))
crime_netHex <-
dplyr::select(phlCrime) %>%
mutate(countCrime = 1) %>%
aggregate(., fishnetHexPhl, sum) %>%
mutate(countCrime = replace_na(countCrime, 0),
uniqueID = rownames(.),
cvID = sample(round(nrow(fishnetHexPhl) / 24),
size=nrow(fishnetHexPhl), replace = TRUE))
ggplot() +
geom_sf(data = crime_netHex, aes(fill = countCrime), color = "transparent", size = 0) +
scale_fill_viridis()+
labs(title = "Count of Crimes within Hex Grid",
subtitle = "Philadelphia, PA")
crime_joined <- left_join(phlCrime, eagles, by = "date") %>%
mutate(home_game = ifelse(team_home == "Philadelphia Eagles", 1, 0),
away_game = ifelse(team_away == "Philadelphia Eagles", 1, 0),
favorite = ifelse(team_favorite_id == "PHI", 1, 0),
rivalry = ifelse(team_home == "Dallas Cowboys" | team_home == "Washington Redskins" |
team_home == "New York Giants" | team_away == "Dallas Cowboys" |
team_away == "Washington Redskins" | team_away == "New York Giants", 1, 0),
gameday = ifelse(schedule_playoff == "TRUE" | schedule_playoff == "FALSE", 1, 0),
dotw = wday(mdy(schedule_date)),
total_points = score_home + score_away)
PLotting crime as point data, representative of each individual incident, provides a visual understanding that does not capture the true amount of events in a given area. Plotting the kernel density, however, shows the same data while offering a better understanding of the spatial distribution of these incidents.
# Plotting Crime for away/home/no game
home_away <- ggplot() +
geom_sf(data = crime_joined, aes(color = home_game), size = 0.5, alpha = 0.3) +
geom_sf(data = phlOutline, fill = "transparent", color = "black", size = 1) +
mapTheme +
theme(legend.position = 'none') +
facet_wrap(~home_game)
home_away_density <- ggplot() +
stat_density2d(data = crime_joined,
aes(point_x, point_y, fill = ..level.., alpha = ..level..),
geom = "polygon",
size = 0.1,
bins = 40) +
# geom_sf(data = phlOutline, fill = "transparent", color = "black", size = 1) +
scale_fill_viridis_c() +
mapTheme +
theme(legend.position = 'none') +
facet_wrap(~home_game)
grid.arrange(home_away, home_away_density, nrow = 2)
Separating crime into three categories will be the basis of the next component of analysis. Here, crimes will be aggregated into three types of days: days with no Eagle’s game (NA), days on which the Eagle’s played at home(1), and days on which the Eagle’s played an away game(0).
ggplot() +
geom_sf(data = crime_joined) +
facet_wrap(~home_game)
Again, point data is not the best representation of this data, so it is aggregated to a fishnet for both away and home games.
# plot separate fishnet maps for away games and home games
away <- subset(crime_joined, away_game == 1)
home <- subset(crime_joined, home_game == 1)
hex_grid_away <-
dplyr::select(away) %>%
mutate(countCrime = 1) %>%
aggregate(., fishnetHexPhl, sum) %>%
mutate(countCrime = replace_na(countCrime, 0),
uniqueID = rownames(.),
cvID = sample(round(nrow(fishnetHexPhl) / 24),
size=nrow(fishnetHexPhl), replace = TRUE))
away_plot <- ggplot() +
geom_sf(data = hex_grid_away, aes(fill = countCrime), color = "transparent", size = 0) +
scale_fill_viridis()+
labs(title = "Away Games",
subtitle = "Philadelphia Eagles - 2019") +
mapTheme
hex_grid_home <-
dplyr::select(home) %>%
mutate(countCrime = 1) %>%
aggregate(., fishnetHexPhl, sum) %>%
mutate(countCrime = replace_na(countCrime, 0),
uniqueID = rownames(.),
cvID = sample(round(nrow(fishnetHexPhl) / 24),
size=nrow(fishnetHexPhl), replace = TRUE))
home_plot <- ggplot() +
geom_sf(data = hex_grid_home, aes(fill = countCrime), color = "transparent", size = 0) +
scale_fill_viridis()+
labs(title = "Home Games",
subtitle = "Philadelphia Eagles - 2019") +
mapTheme
grid.arrange(home_plot, away_plot, nrow = 1)
While these fishnets only account for crime from one year, there is a visible difference in crime location on away gamedays than home gamedays.